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Abstract 

One of the major successes in computational biology has been the unification, using the graphi- 
cal model formalism, of a multitude of algorithms for annotating and comparing biological sequences. 
Graphical models that have been applied towards these problems include hidden Markov models for 
annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single al- 
gorithm, the sum-product algorithm, solves many of the inference problems associated with different 
statistical models. This paper introduces the polytope propagation algorithm for computing the Newton 
polytope of an observation from a graphical model. This algorithm is a geometric version of the sum- 
product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference 
calculations for graphical models. 

1 Inference with Graphical Models for Biological Sequence Analysis 

This paper develops a new algorithm for graphical models based on the mathematical foundation for statis- 
tical models proposed in fTHI . Its relevance for computational biology can be summarized as follows: 

(a) Graphical models are a unifying statistical framework for biological sequence analysis. 

(b) Parametric inference is important for obtaining biologically meaningful results. 

(c) The polytope propagation algorithm solves the parametric inference problem. 

Thesis (a) states that graphical models are good models for biological sequences. This emerging un- 
derstanding is the result of practical success with probabilistic algorithms, and also the observation that 
inference algorithms for graphical models subsume many apparently non-statistical methods. A noteworthy 
example of the latter is the explanation of classic alignment algorithms such as Needleman-Wunsch and 
Smith- Waterman in terms of the Viterbi algorithm for pair hidden Mai^kov models Q. Graphical models are 
now used for many problems including motif detection, gene finding, alignment, phytogeny reconstruction 
and protein structure prediction. For example, most gene prediction methods are now hidden Markov model 
(HMM) based, and previously non-probabilistic methods now have HMM based re-implementations. 

In typical applications, biological sequences are modeled as observed random variables J^i , . . . , F„ in a 
graphical model. The observed random variables may correspond to sequence elements such as nucleotides 
or amino acids. Hidden random variables X\,.. . ,X,„ encode information of interest that is unknown, but 
which one would like to infer For example, the information could be an annotation, alignment or ances- 
tral sequence in a phylogenetic tree. One of the strengths of graphical models is that by virtue of being 
probabilistic, they can be combined into powerful models where the hidden variables ai^e more complex. 
For example, hidden Markov models can be combined with pair hidden Markov models to simultaneously 
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align and annotate sequences One of the drawbacks of such approaches is that the models have more 
parameters and as a result inferences could be less robust. 

For a fixed observed sequence 0i02---0n and fixed parameters, the standard inference problems are: 

1. the calculation of marginal probabilities: 

Poi -on = Pi"ob(Xi =/ii,...,X,„ = /j,„,7i =ai,...,F„ = a„) 

hi,...,h„, 

1. the calculation of maximum a posteriori log probabilities: 

5ci-0„ = min -log (Prob(Xi =hi,... = h,n,Yi = ai, . . . ,7„ = a„)) , 

hi,...,h,„ 

where the /i, range over all the possible assignments for the hidden random variables Xj. In practice, it is the 
solution to Problem 2 that is of interest, since it is the one that solves the problem of finding the genes in a 
genome or the "best" alignment for a pair of sequences. A shortcoming of this approach is that the solution 
h = [hi, . . . ,h,„) may vary considerably with a change in parameters. 

Thesis (b) suggests that a parametric solution to the inference problem can help in ascertaining the 
reliability, robustness and biological meaning of an inference result. By parametric inference we mean the 
solution of Problem 2 for all model parameters simultaneously. In this way we can decide if a solution 
obtained for particular parameters is an artifact or is largely independent of the chosen parameters. This 
approach has already been applied successfully to the problem of pairwise sequence alignment in which 
parameter choices aie known to be crucial in obtaining good alignments |5lEll24J. Our aim is to develop 
this approach for arbitrary graphical models. In thesis (c) we claim that the polytope propagation algorithm 
is efficient for solving the parametric inference problem, and, in certain cases is not much slower than 
solving Problem 2 for fixed parameters. The algorithm is a geometric version of the sum-product algorithm, 
which is the standard tool for inference with graphical models. 

The mathematical setting for understanding the polytope propagation algorithm is tropical geometry. 
The connection between tropical geometry and parametric inference in statistical models is developed in the 
companion paper lilSil . Here we describe the details of the polytope propagation algorithm (Section 3) in two 
familiar settings: the hidden Markov model for annotation (Section 2) and the pair hidden Markov model 
for alignment (Section 4). Finally, in Section 5, we discuss some practical aspects of parametric inference, 
such as specializing parameters, the construction of single cones which eliminates the need for identifying 
all possible maximum a posteriori explanations, and the relevance of our findings to Bayesian computations. 

2 Parametric Inference with Hidden Markov Models 

Hidden Markov models play a central role in sequence analysis, where they are widely used to annotate DNA 
sequences (2|. A simple example is the CpG island annotation problem j4l §3]. CpG sites are locations in 
DNA sequences where the nucleotide cytosine (C) is situated next to a guanine (G) nucleotide (the "p" comes 
from the fact that a phosphate links them together). There are regions with many CpG sites in eukaryotic 
genomes, and these are of interest because of the action of DNA methyltransferase, which recognizes CpG 
sites and converts the cytosine into 5-methylcytosine. Spontaneous deamination causes the 5-methylcytosine 
to be converted into thymine (T), and the mutation is not fixed by DNA repair mechanisms. This results in 
a gradual erosion of CpG sites in the genome. CpG islands ai^e regions of DNA with many unmethylated 
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CpG sites. Spontaneous deamination of cytosine to thymine in these sites is repaired, resulting in a restored 
CpG site. The computational identification of CpG islands is important, because they are associated with 
promoter regions of genes, and are known to be involved in gene silencing. 

Unfortunately, there is no sequence characterization of CpG islands. A generally accepted definition due 
to Gardiner-Garden and Frommer (8) is that a CpG island is a region of DNA at least 200bp long with a G+C 
content of at least 50%, and with a ratio of observed to expected CpG sites of at least 0.6. This arbitrary 
definition has since been refined (e.g. |23||), however even analysis of the complete sequence of the human 
genome fT6l has failed to reveal precise criteria for what constitutes a CpG island. Hidden Markov models 
can be used to predict CpG islands |4l §3]. We have selected this application of HMMs in order to illustrate 
our approach to parametric inference in a mathematically simple setting. 

The CpG island HMM we consider has n hidden binary random variables Xj, and n observed random 
variables F,- that take on the values {A,C,G,r} (see Figure 1 in USD . In general, an HMM can be charac- 
terized by the following conditional independence statements for / = 1, . . . 

p(X,-|Xi,X2,...,X,-_i) = p{Xi\Xi^,), 
piJi\Xu...,XiJu..-Ji-i) = p{Yi\Xi). 
The CpG island HMM has twelve model parameters, namely, the entries of the transition matrices 

and T = '"^ . 

\hA he he hr J 

Here the hidden state space has just two states non-CpG = and CpG = 1 with transitions allowed between 
them, but in more complicated applications, such as gene finding, the state space is used to model numerous 
gene components (such as introns and exons) and the sparsity pattern of the matrix S is crucial. In its 
algebraic representation (TEl §2], the HMM is given as the image of the polynomial map 

iS,T) ^ thiaii^hih2fh2aijh2h---i^h„-ihjh„a,r (1) 

/!e{0,l}" 

The inference problem 1 asks for an evaluation of one coordinate polynomial fa of the map /. This can be 
done in linear time (in n) using the forward algorithm 1131 . which recursively evaluates the formula 

1/1 1 1 X 

fa = ^''«o„ ( XI ^h„^ihJh„-ia„-i ■■■{Yj ^inlnSh^a.i Yj ^hihi^hiOi)) ■■ j (2) 

h„=0 \/j„_i=0 /i2=0 hi=0 J 

Problem 2 is to identify the largest term in the expansion of /o. Equivalently, if we write Uij = — log(5,y) 
and Vjj = —\og{tij) then Problem 2 is to evaluate the piecewise-linear function 

go = min/,„ v/,„a„ + (min/,„_, /j„ + Vft„_ia„_, H h (min/^^v/j^/,, + Uh^c. + (min/j; uu^h. + v/jioi ))•••)• 

This formula can be efficiently evaluated by recursively computing the parenthesized expressions. This is 
known as the Viterbi algorithm in the HMM literature. The Viterbi and forward algorithms are instances of 
the more general sum-product algorithm ll4l. 

What we are proposing in this paper is to compute the collection of cones in R'^ on which the piecewise- 
linear function g^ is linear. This may be feasible because the number of cones grows polynomially in n. Each 
cone is indexed by a binary sequence h G {0, 1}" which represents the CpG islands found for any system of 
parameters {uij,Vij) in that cone. A binary sequence which arises in this manner is an explanation for a in 
the sense of fTEl §4]. Our results in fTE\ imply that the number of explanations scales polynomially with n. 
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Theorem 1. For any given DNA sequence a of length n, the number of bit strings h € {0, 1}" which are 
explanations for the sequence O in the CpG island HMM is bounded above by a constant times n^-^^. 

Proof. There are a total of 2 -4 + 4 = 12 parameters which is the dimension of the ambient space. Note, 
however, that for a fixed observed sequence the number of times the observation A is made is fixed, and 
similarly for C, G, T. Furthermore, the total number of transitions in the hidden states must equal n. Together, 
these constraints remove five degrees of freedom. We can thus apply fTSl Theorem 7] with <i = 12 — 5 = 7. 
This shows that the total number of vertices of the Newton polytope of fa is C?(?i"8") = □ 




Figure 1 : The Schlegel diagram of the Newton polytope of an observation in the CpG island HMM. 

We explain the biological meaning of our parametric analysis with a very small example. Let us consider 
the following special case of the CpG island HMM. First, assume that tiA = f,T and that tic = tic, i.e., the 
output probability depends only on whether the nucleotide is a purine or pyrimidine. Furthermore, assume 
that ?0A = foG> which means that the probability of emitting a purine or a pyrimidine in the non-CpG island 
state is equal (i.e. base composition is uniform in non-CpG islands). 

Suppose that the observed sequence is a = AATAGCGG. We ask for all the possible explanations for 
a, that is, for all possible maximum a posteriori CpG island annotations for all parameters. A priori, the 
number of explanations is bounded by 2^ = 256, the total number of binary strings of length eight. However, 
of the 256 binary strings, only 25 ai^e explanations. Figure 1 is a geometric representation of the solution 
to this problem: the Newton polytope of is a 4-dimensional polytope with 25 vertices. The figure is a 
Schlegel diagram of this polytope. It was drawn with the software POLYMAKE |9jn0l. The 25 vertices 
in Figure 1 correspond to the 25 annotations, which are the explanations for a as the parameters vary. Two 
annotations are connected by an edge if and only if their parameter cones share a wall. From this geometric 
representation, we can determine all pai"ameters which result in the same maximum a posteriori prediction. 
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3 Polytope Propagation 

The evaluation of ga for fixed parameters using tiie formulation in Q is known as the Viterbi algorithm in 
the HMM literature. We begin by re-interpreting this algorithm as a convex optimization problem. 

Definition 2. The Newton polytope of a polynomial 

n 

r/ \ "1,1 "2./ "rf,; 

J[Xl,...,X^) — ^C/'JCj X2 '''X^ 

1=1 

is defined to be the convex hull of the lattice points in corresponding to the monomials in f: 
NP{f) = conv{{ai,i,a2^,...,ad,i),--- ,iai,n,a2,r,, ■ ■ ■ ,ad^„)}. 

Recall that for a fixed observation there are natural polynomials associated with a graphical model, 
which we have been denoting by fc. In the CpG island example from Section 2, these polynomials are the 
coordinates fc of the polynomial map / in Q. Each coordinate polynomial f^ is the sum of 2" monomials, 
where « = |a|. The crucial observation is that even though the number of monomials grows exponentially 
with n, the number of vertices of the Newton polytope NP{fa) is much smaller. The Newton polytope is 
important for us because its vertices represent the solutions to the inference problem 2. 

Proposition 3. The maximum a posteriori log probabilities 60 in Problem 2 can be determined by minimiz- 
ing a linear functional over the Newton polytope of f^. 

Proof. This is nothing but a restatement of the fact that when passing to logarithms, monomials in the 
parameters become linear functions in the logarithms of the parameters. □ 

Our main result in this section is an algorithm which we state in the form of a theorem. 

Theorem 4 (Polytope propagation). Let f^ be the polynomial associated to a fixed observation a from 
a graphical model. The list of all vertices of the Newton polytope of f^ can be computed efficiently by 
recursive convex hull and Minkowski sum computations on unions of polytopes. 

Proof. Observe that if /i,/2 are polynomials then AfP(/i • 72) = NP{f\) +NP{f2) where the + on the right 
hand side denotes the Minkowski sum of the two polytopes. Similarly, NP{fi + fi) = com(NP{fi) U 
NP{f2)) if /i and /2 ai^e polynomials with positive coefficients. The recursive description of /o given in (EJi 
can be used to evaluate the Newton polytope efficiently. The necessary geometric primitives are precisely 
Minkowski sum and convex hull of unions of convex polytopes. These primitives run in polynomial time 
since the dimension of the polytopes is fixed. This is the case in our situation since we consider graphical 
models with a fixed number of parameters. We can hence run the sum-product algorithm efficiently in the 
semiring known as the polytope algebra. The size of the output scales polynomiaUy by (TEl Thm. 7]. □ 

Figure 2 shows an example of the polytope propagation algorithm for a hidden Markov model with all 
random variables binary and with the following transition and output matrices: 




Here we specialized to only two parameters in order to simplify the diagram. When we run polytope propa- 
gation for long enough DNA sequences a in the CpG island HMM of Section 2 with all 12 free parameters, 
we get a diagram just like Figure 2, but with each polygon replaced by a seven-dimensional polytope. 
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Figure 2: Graphical representation of the polytope propagation algorithm for a hidden Markov model. For 
a particular pair of parameters, there is one optimal Viterbi path (shown as large vertices on the polytopes). 

It is useful to note that for HMMs, the Minkowski sum operations ai^e simply shifts of the polytopes, 
and therefore the only non-trivial geometric operations required are the convex hulls of unions of polytopes. 
The polytope in Figure 1 was computed using polytope propagation. This polytope has dimension 4 (rather 
than 7) because the sequence a = AATAGCGG is so short. We wish to emphasize that the small size of our 
examples is only for clarity; there is no practical or theoretical banier to computing much larger instances. 

For general graphical models, the running time of the Minkowski sum and convex hull computations 
depends on the number of parameters, and the number of vertices in each computation. These are clearly 
bounded by the total number of vertices of NP{fa), which are bounded above by I18t Theorem 7]: 

#vertices(AfP(/c)) < constant •£'''(''"^)/(''+') < constant 

Here E is the number of edges in the graphical model (often linear in the number of vertices of the model). 
The dimension d of the Newton polytope NP{fa) is fixed because it is bounded above by the number of 
model parameters. The total running time of the polytope propagation algorithm can then be estimated 
by multiplying the running time for the geometric operations of Minkowski sum and convex hull with the 
running time of the sum-product algorithm. In any case, the running time scales polynomially in E. 

We have shown in fTEl §4] that the vertices of NP{fa) correspond to explanations for the observation a. 
In parametric inference we are interested in identifying the parameter regions that lead to the same expla- 
nations. Since parameters can be identified with linear- functionals, it is the case that the set of parameters 
that lead to the same explanation (i.e. a vertex v) are those linear functionals that minimize on v. The set of 
these linear functionals is the normal cone ofNP{fa) at v. The collection of all normal cones at the various 
vertices v forms the normal fan of the polytope. Putting this together with Proposition |2] we obtain: 

Proposition 5. The normal fan of the Newton polytope off,; solves the parametric inference problem for an 
observation a in a graphical model. It is computed using the polytope propagation algorithm. 
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Figure 3: A pair hidden Markov model for sequence alignment. 



An implementation of polytope propagation for arbitrary graphical models is currently being developed 
within the geometry software package POLYMAKE fQinOj by Michael Joswig. 



4 Parametric Sequence Alignment 

The sequence alignment problem asks to find the best alignment between two sequences which have evolved 
from a common ancestor via a series of mutations, insertions and deletions. Formally, given two sequences 



= and 



•a,,, over the alphabet {0, 1, 1}, an alignment is a string over 



the alphabet {M,/,D} such that #M + #D = n and #M + #I = m. Here #M,#I,#D denote the number of 
characters M,I,D in the word respectively. An alignment records the "edit steps" from the sequence to 
the sequence o^, where edit operations consist of changing characters, preserving them, or inserting/deleting 
them. An / in the alignment string corresponds to an insertion in the first sequence, a D is a deletion in the 
first sequence, and an M is either a character change, or lack thereof. We write Jin.m for the set of all 
alignments. For a given h G .^„,.„, we will denote the 7th character in h by hj, we write h\i] for #M + #I in 
the prefix /zi/j2 -/j,-, and we write h{j) for #M + #D in the prefix h\h2...hj. The cardinality of the set Jl„,„ 
of all aUgnments can be computed as the coefficient of in the generating function 1/(1 —x — y — xy). 
These coefficients ai^e known as Delannoy numbers in combinatorics lIlTI §6.3]. 

Bayesian multi-nets were introduced in and are extensions of graphical models via the introduction of 
class nodes, and a set of local networks corresponding to values of the class nodes. In other words, the value 
of a random variable can change the structure of the graph underlying the graphical model. The pair hidden 
Markov model (see Figure|3ll is an instance of a Bayesian multinet. In this model, the hidden states (unshaded 
nodes forming the chain) take on one of the values M,I,D. Depending on the value at a hidden node, either 
one or two characters are generated; this is encoded by plates (squares around the observed states) and class 
nodes (unshaded nodes in the plates). The class nodes take on the values or 1 corresponding to whether or 
not a character is generated. Pair hidden Markov models are therefore probabilistic models of alignments, 
in which the structure of the model depends on the assignments to the hidden states. 

Our next result gives the precise description of the pair HMM for sequence alignment in the language 
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of algebraic statistics, namely, we represent this model by means of a polynomial map /. Let a*, be the 
output strings from a pair hidden Mai^kov model (of lengths n^m respectively). Then: 

\h\ 

heR„,,„ i=2 

where is the transition probability from state to hi and f/i,(c^^[,];<7^(,p are the output probabilities 
for a given state hi and the corresponding output characters on the strings 0^,0^. 

Proposition 6. The pair hidden Markov model for sequence alignment is the image of a polynomial map 
f : r9+2/+'" ^ r'" "' j/jg coordinates of f are polynomials of degree n + m+\ in 0. 

We need to explain why the number of parameters is 9 + 2/ + 1^. First, there are nine parameters 

(^MM SmI Smd\ 
Sim sn sid , 
sdm sqi Sdd J 

which play the same role as in Section 2, namely, they represent transition probabilities in the Markov chain. 
There are parameters tM{a,b) =: tMab for the probability that letter a in a' is matched with letter b in o^. 
The insertion parameters tj{a,b) depend only on the letter b, and the deletion parameters tD{a,b) depend 
only on the letter a, so there ai^e only 21 of these parameters. In the upcoming example, which explains the 
algebraic representation of Proposition |6l we use the abbreviations tjh and tpa for these parameters. 

Consider two sequences a' = ij and = klm of length n = 2 and m = 3 over any alphabet. The 
number of alignments is #{An.m) = 25, and they are listed in Table 1. The polynomial f^^i ^2 is the sum 
of the 25 monomials (of degree 9,7,5) in the rightmost column. For instance, if we consider strings over 
the binary alphabet {0, 1}, then there are 17 parameters (nine ^-parameters and eight ^-parameters), and the 
pair HMM for alignment is the image of a map / : R^^ — > R-''^. The coordinate of / which is indexed by 
{i,j,k,l,m) G {0, 1}^ equals the 25-term polynomial gotten by summing the rightmost column in Table 1. 

The pai^ametric inference problem for sequence alignment is solved by computing the Newton polytopes 
NP{fcj.a2) with the polytope propagation algorithm. In the terminology introduced in 1181 §4], an obser- 
vation a in the pair HMM is the pair of sequences (01,02), and the possible explanations are the optimal 
alignments of these sequences with respect to the various choices of parameters. In summary, the vertices 
of the Newton polytope A'^/'(/ai,c2) con^espond to the optimal alignments. If the observed sequences 01,02 
are not fixed then we are in the situation of fTFl Proposition 6]. Each parameter choice defines a function 
from pairs of sequences to alignments: 

{0, . . . , Z - 1 }" X {0, . . . , Z - 1}'" ^ Ji„,„ , (ai , 02) ^ h. 

The number of such functions grows doubly-exponentially in n and m, but only a tiny fraction of them are 
inference functions, which means they con^espond to the vertices of the Newton polytope of the map /. It is 
an interesting combinatorial problem to characterize the inference functions for sequence alignment. 

An important observation is that our formulation in Problem 2 is equivalent to combinatorial "scoring 
schemes" or "generalized edit distances" which can be used to assign weights to alignments ||3l- For exam- 
ple, the simplest scoring scheme consists of two parameters: a mismatch score mis, and an indel score gap 
||5l[ni24l. The weight of an alignment is the sum of the scores for all positions in the alignment, where a 
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Table 1: Alignments for a pair of sequences of length 2 and 3. 
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match is assigned a score of 1. This is equivalent to specializing the logaiithmic pai^ameters U = — log(5') 
and V = — log(r) of the pair hidden Markov model as follows: 

Uij = 0, VMij = 1 if / = 7, VMij = mis if / ^ j, and V[j = voi = gap for all /, j. (5) 

This specialization of the parameters corresponds to intersecting the normal fan of the Newton polytope 
with a two-dimensional affine subspace (whose coordinates are called mis and gap). 

Efficient software for parametrically aligning the sequences with two free parameters already exists 
(XPARAL f\T\). Consider the example of the following two sequences: =AGGACCGATTACAGTTCAA 
and = TTCCTAGGTTAAACCTCATGCA. XPARAL will return four cones, however a computation of 
the Newton polytope reveals seven vertices (three correspond to positive mis or gap values). The polytope 
propagation algorithm has the same running time as XPARAL: for two sequences of length n,m, the method 
requires 0{nm) two-dimensional convex hull computations. The number of points in each computation is 
bounded by the total number of points in the final convex hull (or equivalently the number, K, of expla- 
nations). Each convex hull computation therefore requires at most 0{K\og{K)) operations, thus giving an 
0{nmK\og{K)) algorithm for solving the parametric alignment problem. However, this running time can 
be improved by observing that the convex hull computations that need to be earned out have a very special 
form, namely in each step of the algorithm we need to compute the convex hull of two superimposed convex 
polygons. This procedure is in fact a primitive of the divide and conquer approach to convex hull computa- 
tion, and there is a well known 0{K) algorithm for solving it fT9l §3.3.5]. Therefore, for two parameters, our 
recursive approach to solving the parametric problem yields an 0{Kmn) algorithm, matching the running 
time of XPARAL and the conjecture of Waterman, Eggert and Lander 12411 . 

In order to demonstrate the practicality of our approach for higher-dimensional problems, we imple- 
mented a four parameter recursive parametric alignment solver. The more general alignment model includes 
different transition/transversion parameters (instead of just one mismatch parameter), and separate parame- 
ters for opening gaps and extending gaps. A transition is mutation from one purine (A or G) to another, or 
from one pyrimidine (C or T) to another, and a transversion is a mutation from a purine to a pyrimidine or 
vice versa. More precisely, if we let Pi, = {A, G} and Py = {C, T} the model is: 

umm = UlM = udm = 

umi = umd = gapopen 

uii = udd = gapextend 

VMij = 1 if = 7 

VMij = transt if / / j, and ij G P„ or i,] G Py 

VMij = transv if / / j, and i ePuJ ^ Py or vice versa 

vjj = vj)i = for all /, j. 

For the two sequences a' and in the example above, the number of vertices of the four dimensional 
Newton polytope (shown in Figure 4) is 224 (to be compared to 7 for the two parameter case). 



5 Practical Aspects of Parametric Inference 

We begin by pointing out that parametric inference is useful for Bayesian computations. Consider the 
problem where we have a prior distribution ^{s) on our parameters s = {s\,.. . ,Sd), and we would like to 
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Figure 4: Edge graph of the Newton polytope for a four parameter alignment problem. 

compute the posterior probability of a maximum a posteriori explanation h: 

Prob(X = h|Y = a) = J Pmh{X = h\Y = a, su...,sj)K{s)ds. (6) 

This is an important problem, since it can give a quantitative assessment of the validity of h in a setting 
where we have prior, but not certain, information about the parameters, and also because we may want to 
sample h according to its posterior distribution (for an example of how this can be applied in computational 
biology see El)- Unfortunately, these integrals may be difficult to compute. We propose the following 
simple Monte Carlo algorithm for computing a numerical approximation to the integral 

Proposition 7. Select N parameter vectors s^^\... ,s^^^ according to the distribution Tl{s), where N is much 
larger than the number of vertices of the Newton polytope NP{fa)- Let K be the number of s^'^ such that 
—log{s^'^) lies in the normal cone ofNP{fa) indexed by the explanation h. Then K/N approximates @. 

Proof. The expression Prob(X = h | Y = a, 5i, . . . ) is zero or one depending on whether the vector 
— log(5') = (— log(5'i), . . . , —\og{sd)) lies in the normal cone of NP{fc) indexed by h. This membership test 
can be done without ever running the sum-product algorithm if we precompute an inequality representation 
of the normal cones. □ 



The bound on the number of vertices of the Newton polytope in JTSl §4] provides a valuable tool for 
estimating the quality of this Monte Carlo approximation. We believe that the tropical geometry developed 
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in (TE\ will also be useful for more refined analytical approaches to Bayesian integrals. The study of Newton 
poly topes can also complement the algebraic geometry approach to model selection proposed in EUll . 

Another application of parametric inference is to problems where the number of parameters may be very 
large, but where we want to fix a large subset of them, thereby reducing the dimensions of the polytopes. 
Gene finding models, for example, may have up to thousands of parameters and input sequences can be 
millions of base pairs long however, we are usually only interested in studying the dependence of inference 
on a select few. Although specializing parameters reduces the dimension of the parameter space, the expla- 
nations correspond to vertices of a regular subdivision of the Newton polytope, rather than just to the vertices 
of the polytope itself. This is explained below (readers may also want to refer to fTSll for more background). 

Consider a graphical model with parameters si,...,Sd of which the parameters si,...,Sr are free but 
Sr+i = S,-+i,. .. ,Sd = Sci where the Si are fixed non-negative numbers. Then the coordinate polynomials fc 
of our model specialize to polynomials in r unknowns whose coefficients Ca are non-negative numbers: 

fa{si, . . . ,Sr) = f(,{si, . . . ,Sr,Sr+\, . ■ . ,Sd) = ^ Ca ■ s"y ■ ■ ■ s"' . 

The support of this polynomial is the finite set .^^ = {a € N'' : q > 0}. The convex hull of Acs in R' is 
the Newton polytope of the polynomial fc = fd^i, • ■ ■ ,^r)- For example, in the case of the hidden Markov 
model with output parameters specialized, the Newton polytope of /o is the polytope associated with a 
Markov chain. Kuo (TSi shows that the size of these polytopes does not depend on the length of the chain. 

Let h be any explanation for o in the original model and let {ui,...,Ur, w^+i , ■ ■■ ,Un) be the vertex of 
the Newton polytope of /o corresponding to that explanation. We abbreviate ai, = ("i, • • • :Ur) and = 
S"''_^[ ■ ■ -Sy . The assignment h aj, defines a map from the set of explanations of o to the support ^c- The 
convex hull of the image coincides with the Newton polytope of /o. We define 

Wa = min{ — log(5'h) : h is an explanation for a with ah = a^. (7) 

If the specialization is sufficiently generic then this maximum is attained uniquely, and, for simplicity, we 
will assume that this is the case. If a point a G Jlo is not the image of any explanation h then we set Wa = °°- 
The assignment a i-^^ is a real valued function on the support of our polynomial /o, and it defines a 
regular polyhedral subdivision of the Newton polytope NP{fa)- Namely, A^, is the polyhedral complex 
consisting of all lower faces of the polytope gotten by taking the convex hull of the points {a,Wa) in R''+^ 
See ll22l for details on regular triangulations and regular polyhedral subdivisions. 

Theorem 8. The explanations for the observation o in the specialized model are in bijection with the vertices 
of the regular polyhedral subdivision A„, of the Newton polytope of the specialized polynomial /o. 

Proof. The point (a, w^) is a vertex of A^. if and only if the following open polyhedron is non-empty: 

Pa = {v £ R'' : a ■ V + Wa < a' ■ V + Wa' for all a € A(;\{a} } . 

If V is a point in Pa then we set Si = exp(— v,) for / = 1, . . . ,r, and we consider the explanation h which 
attains the minimum in Q. Now all parameters have been specialized and h is the solution to Problem 2. 
This argument is reversible: any explanation for a in the specialized model arises from one of the non-empty 
polyhedra P„. We note that the collection of polyhedra Pa defines a polyhedral subdivision of R'" which is 
geometrically dual to the subdivision A^. of the Newton polytope of f^. □ 

In practical applications of parametric inference, it may be of interest to compute only one normal cone 
of the Newton polytope (for example the cone containing some fixed parameters). We conclude this section 
by observing that the polytope propagation algorithm is suitable for this computation as well: 
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Proposition 9. Let v be a vertex of a d-dimensional Newton poly tope of a hidden Markov model. Then the 
normal cone containing v can be computed using a poly tope propagation algorithm in dimension d —\. 

Proof. We run the standard polytope propagation algorithm described in Section 4, but at each step we 
record only the minimizing vertex in the direction of the log parameters, together with its neighboring 
vertices in the edge graph of the Newton polytope. It follows, by induction, that given this information at 
the nth step, we can use it to find the minimizing vertices and related neighbors in the (« + l)st step. □ 

6 Summary 

We envision a number of biological applications for the polytope propagation algorithm, including: 

• Full parametric inference using the normal fan of the Newton polytope of an observation when the 
graphical model under consideration has only few model parameters. 

• Utilization of the edge graph of the polytope to identify stable parts of alignments and annotations. 

• Construction of the normal cone containing a specific parameter vector when computation of the full 
Newton polytope is infeasible. 

• Computation of the posterior probability (in the sense of Bayesian statistics) of an alignment or anno- 
tation. The regions for the relevant integrations are the normal cones of the Newton polytope. 

As we have seen, the computation of Newton polytopes for (interesting) graphical models is certainly 
feasible for a few free parameters, and we expect that further analysis of the computational geometry should 
yield efficient algorithms in higher dimensions. For example, the key operation, computation of convex hulls 
of unions of convex polytopes, is likely to be considerably easier than general convex hull computations 
even in high dimensions. Fukuda, Liebling and Liitlof Q give a polynomial time algorithm for computing 
extended convex hulls (convex hulls of unions of convex polytopes) under the assumption that the polytopes 
are in general position. Furthermore, it should be possible to optimize the geometric algorithms for specific 
models of interest, and combinatorial analysis of the Newton polytopes arising in graphical models should 
yield better complexity estimates (see, e.g., ||5l [O)- Michael Joswig is currently working on a general 
polytope propagation implementation in POLYMAKE l9l lT0l . 

In the case where computation of the Newton polytope is impractical, it is still possible to identify the 
cone containing a specific parameter, and this can be used to quantitatively measure the robustness of the 
inference. Parameters near a boundary are unlikely to lead to biologically meaningful results. Furthermore, 
the edge graph can be used to identify common regions in the explanations con^esponding to adjacent ver- 
tices. In the case of alignment, biologists might see a collection of alignments rather than just one optimal 
one, with common sub-alignments highlighted. This is quite different from returning the k best alignments, 
since suboptimal alignments may not be vertices of the Newton polytope. The solution we propose explicitly 
identifies all suboptimal alignments that can result from similar- pai^ameter choices. 
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